<!DOCTYPE html>
<html lang="en-us">
	<head>
		<meta charset="UTF-8">
		<meta http-equiv="X-UA-Compatible" content="IE=Edge">
		<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
		<title>CTW Dataset</title>
		<link rel="shortcut icon" href="favicon.ico">
		<link rel="stylesheet" href="static/css/bootstrap.min.css">
		<link rel="stylesheet" href="static/css/ionicons.min.css">
		<style>
html {
	height: 100%;
}

body {
	height: 100%;
	text-align: justify;
}

pre {
	padding: 9.5px;
	border: 1px solid #ccc;
	border-radius: 4px;
}

.tutorial-item {
	margin: 10px 10px 10px 10px;
	padding: 12px 20px 0 20px;
	border: 1px solid #ccc;
	border-radius: 4px;
}

.tutorial-item:hover {
	box-shadow: 8px 8px 10px rgba(0, 20, 80, 0.1);
}

.footer {
	position: absolute;
	bottom: 0;
	width: 100%;
	background-color: #000;
	color: #fff;
}
		</style>
<!-- Google Tag Manager -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-5GTWKX6');</script>
<!-- End Google Tag Manager -->
	</head>
	<body>
		<!-- Google Tag Manager (noscript) -->
		<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-5GTWKX6"
		height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
		<!-- End Google Tag Manager (noscript) -->
		<div style="min-height: 100%">
			<div class="row" style="position: relative; margin: 0; color: #fff; background-color: #000;">
				<div class="col-md-2" style="height: 95px;">
					<div style="position: absolute; top: 15px; left: 5px;">
						<div style="position: relative;">
							<object data="static/img/ttlogo.svg" type="image/svg+xml" height="70"></object>
							<nobr style="position: absolute; top: 10px; left: 77px; font-size: 12px;">TSINGHUA UNIVERSITY - TENCENT</nobr>
							<nobr style="position: absolute; top: 30px; left: 77px; font-size: 20px;">JOINT LABORATORY</nobr>
						</div>
					</div>
				</div>
				<div class="col-md-8">
					<h1 class="text-center" style="margin-top: 18px;">A Large Chinese Text Dataset in the Wild</h1>
					<p class="text-center">Tai-Ling Yuan, Zhe Zhu, Kun Xu, Cheng-Jun Li, Tai-Jiang Mu and Shi-Min Hu</p>
				</div>
			</div>
			<div class="container">
				<p style="margin-top: 15px;">In this paper, we introduce a very large Chinese text dataset in the wild. While optical character recognition (OCR) in document images is well studied and many commercial tools are available, the detection and recognition of text in natural images is still a challenging problem, especially for more complicated character sets such as Chinese. Lack of training data has always been a problem, especially for deep learning methods, which require massive amounts of training data. We provide details of a newly created dataset of Chinese text containing about 1 million Chinese characters from 3,850 unique characters, annotated by experts in over 30,000 street view images. This is a challenging dataset with good diversity, containing planar text, raised text, text under poor illumination, distant text, partially occluded text, etc. Besides the dataset, we give baseline results using state-of-the-art methods for three tasks: character recognition (top-1 accuracy of 80.5%), character detection (AP of 70.9%), and text line detection (AED of 22.1). The dataset, source code, and trained models are publicly available.</p>
				<div class="row">
					<div class="col-md-6">
						<ul style="list-style: none;">
							<li><span class="ion-android-checkbox-outline" style="margin-right: 1ex;"></span><span style="font-size: 125%">32,285</span> high-resolution images</li>
							<li><span class="ion-android-checkbox-outline" style="margin-right: 1ex;"></span><span style="font-size: 125%">1,018,402</span> character instances</li>
							<li><span class="ion-android-checkbox-outline" style="margin-right: 1ex;"></span><span style="font-size: 125%">3,850</span> character categories</li>
							<li><span class="ion-android-checkbox-outline" style="margin-right: 1ex;"></span><span style="font-size: 125%">6</span> kinds of attributes</li>
						</ul>
					</div>
					<div class="col-md-6">
						<ul style="list-style: none;">
							<li><span class="ion-link" style="font-size: 125%; margin-right: 1ex;"></span>Homepage: <a href="https://ctwdataset.github.io/">https://ctwdataset.github.io/</a></li>
							<li><span class="ion-link" style="font-size: 125%; margin-right: 1ex;"></span>GitHub: <a href="https://github.com/yuantailing/ctw-baseline">https://github.com/yuantailing/ctw-baseline</a></li>
							<li><span class="ion-link" style="font-size: 125%; margin-right: 1ex;"></span>Laboratory: <a href="http://cg.cs.tsinghua.edu.cn/">http://cg.cs.tsinghua.edu.cn/</a></li>
						</ul>
					</div>
				</div>
				<div class="row">
					<div class="col-md-4" style="padding: 0;">
						<a href="static/img/gt_1044721_0_0_2048_2048.png"><img style="width: 100%; padding: 0; margin: 0;" src="static/img/gt_1044721_0_0_2048_2048.jpg"></a>
					</div>
					<div class="col-md-4" style="padding: 0;">
						<a href="static/img/gt_2004154_0_0_2048_2048.png"><img style="width: 100%; padding: 0; margin: 0;" src="static/img/gt_2004154_0_0_2048_2048.jpg"></a>
					</div>
					<div class="col-md-4" style="padding: 0;">
						<a href="static/img/gt_2005679_0_0_2048_2048.png"><img style="width: 100%; padding: 0; margin: 0;" src="static/img/gt_2005679_0_0_2048_2048.jpg"></a>
					</div>
					<div class="col-md-12 text-center">
						<p>(Click an image to open it at its original resolution.)</p>
					</div>
				</div>
				<h4 style="margin-top: 10px;">Tutorial</h4>
				<p>For the latest tutorial, please check out our <a href="https://github.com/yuantailing/ctw-baseline">Git repository</a>.</p>
				<div class="row">
					<div class="col-md-4">
						<div class="tutorial-item">
							<h5>Part-1: basics</h5>
							<ul>
								<li>dataset split</li>
								<li>annotation format</li>
								<li>annotation examples</li>
							</ul>
							<p class="text-right"><a href="tutorial/1-basics.html">Learn more &gt;&gt;</a></p>
						</div>
					</div>
					<div class="col-md-4">
						<div class="tutorial-item">
							<h5>Part-2: classification</h5>
							<ul>
								<li>train baseline models</li>
								<li>submission format</li>
								<li>evaluation API</li>
							</ul>
							<p class="text-right" style="font-size: 90%;">(available in the Git repository)</p>
						</div>
					</div>
					<div class="col-md-4">
						<div class="tutorial-item">
							<h5>Part-3: detection</h5>
							<ul>
								<li>train baseline models</li>
								<li>submission format</li>
								<li>evaluation API</li>
							</ul>
							<p class="text-right" style="font-size: 90%;">(available in the Git repository)</p>
						</div>
					</div>
				</div>
				<h4 style="margin-top: 10px;">Files</h4>
				<ul style="list-style: none;">
					<li><a href="http://jcst.ict.ac.cn/EN/10.1007/s11390-019-1923-y"><span class="ion-ios-download-outline" style="margin-right: 1ex;"></span>Paper</a></li>
					<li><a href="https://1drv.ms/b/s!Al-inEPeCzeQgat9X4bO5FYqMLAPVg?e=llCWLA"><span class="ion-ios-download-outline" style="margin-right: 1ex;"></span>Supplementary materials</a></li>
					<li><a href="downloads.html"><span class="ion-ios-download-outline" style="margin-right: 1ex;"></span>Dataset</a></li>
					<li><a href="https://github.com/yuantailing/ctw-baseline"><span class="ion-ios-download-outline" style="margin-right: 1ex;"></span>Baseline code</a></li>
					<li><a href="downloads.html"><span class="ion-ios-download-outline" style="margin-right: 1ex;"></span>Trained models</a></li>
				</ul>
				<h4 style="margin-top: 10px;">Evaluation Server</h4>
				<ul>
					<li>The evaluation server is available on <a href="https://competitions.codalab.org/competitions/?q=CTW">CodaLab</a>.</li>
					<li>Submit a <code>.zip</code> file that contains a single <code>.jsonl</code> file in its top-level directory. Submission formats and evaluation metrics for the classification and detection tasks are described in tutorial part-2 and part-3, respectively.</li>
					<li>Sample submissions can be downloaded from the <i>"public submissions"</i> section of the corresponding competition on CodaLab. You may need to log in to CodaLab before downloading.</li>
					<li>Detailed results are provided in the <i>"view detailed results"</i> link for each submission.</li>
				</ul>
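				<p>The packaging step described above can be scripted. The following is a minimal Python sketch, not part of the official baseline code: it writes predictions as JSON Lines (one JSON object per line) and zips that single <code>.jsonl</code> file at the archive root. The helper <code>make_submission</code> and the file names <code>results.jsonl</code> and <code>submission.zip</code> are hypothetical; the actual fields of each line are defined in tutorial part-2 and part-3.</p>
				<pre>import json
import zipfile
from pathlib import Path


def make_submission(records, jsonl_name="results.jsonl", zip_path="submission.zip"):
    """Hypothetical helper: write records as JSON Lines, then zip the file at the archive root."""
    jsonl_path = Path(jsonl_name)
    # One JSON object per line; ensure_ascii=False keeps Chinese characters readable.
    with jsonl_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    # Using only the bare file name as arcname places the .jsonl in the
    # top-level directory of the zip instead of under any parent folders.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(jsonl_path, arcname=jsonl_path.name)
    return zip_path
</pre>
				<p>For example, <code>make_submission(predictions)</code> writes <code>results.jsonl</code> to the working directory and returns the path of the archive to upload, where <code>predictions</code> is assumed to be a list of dictionaries already in the submission format from the tutorial.</p>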
				<h4 style="margin-top: 10px;">Contact</h4>
				<div style="margin: 0 30px 0 30px;">
					<p><span class="ion-email" style="margin-right: 1ex;"></span>If you have any questions about the dataset or code, please contact Tai-Ling Yuan (yuantailing[at]gmail.com).</p>
					<p><span class="ion-compose" style="margin-right: 1ex;"></span>BibTeX:</p>
					<pre>@article{yuan2019ctw,
  author  = {Tai{-}Ling Yuan and Zhe Zhu and Kun Xu and Cheng{-}Jun Li and Tai{-}Jiang Mu and Shi{-}Min Hu},
  title   = {A Large Chinese Text Dataset in the Wild},
  journal = {Journal of Computer Science and Technology},
  volume  = {34},
  number  = {3},
  pages   = {509--521},
  year    = {2019},
}</pre>
				</div>
				<h4 style="margin-top: 10px;">Change Log</h4>
				<ul>
					<li><span style="font-family: SFMono-Regular,Menlo,Monaco,Consolas,monospace;">06/17/2019 (GMT+8):</span> replaced the paper with <i>A Large Chinese Text Dataset in the Wild</i></li>
					<li><span style="font-family: SFMono-Regular,Menlo,Monaco,Consolas,monospace;">07/04/2018 (GMT+8):</span> dataset moved to OneDrive</li>
					<li><span style="font-family: SFMono-Regular,Menlo,Monaco,Consolas,monospace;">03/17/2018 (GMT+8):</span> evaluation server available</li>
					<li><span style="font-family: SFMono-Regular,Menlo,Monaco,Consolas,monospace;">03/15/2018 (GMT+8):</span> dataset released on WeiYun and Google Drive</li>
					<li><span style="font-family: SFMono-Regular,Menlo,Monaco,Consolas,monospace;">02/28/2018 (GMT+8):</span> website went online</li>
				</ul>
				<h4 style="margin-top: 10px;">Terms of Use</h4>
				<div>
					<ul style="font-size: 90%;">
						<li>The public annotations and trained models belong to the CSCG Group and are licensed under the <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</li>
						<li>The images belong to Tencent Ltd. and are licensed under the <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</li>
						<li>Most of the baseline code belongs to Tai-Ling Yuan and is licensed under the <a href="https://mit-license.org/">MIT License</a>.</li>
					</ul>
				</div>
			</div>
			<div style="height: 40px;"></div>
		</div>
		<div style="position: relative; height: 0;">
			<div class="footer">
				<div class="text-center">Copyright &copy; 2018 CSCG Group</div>
			</div>
		</div>
	</body>
</html>
